Word boundary hypothesization in Hindi speech
Abstract
This paper proposes a method for hypothesizing word boundaries in Hindi speech. The method rests on the observation that function words such as case markers, pronouns, and conjunctions occur frequently in Hindi text. Spotting these frequently occurring patterns is therefore proposed as a means of hypothesizing word boundaries in a speech-to-text conversion system for Hindi. The idea was first tested on correct text from which all word boundaries except sentence boundaries had been removed, and nearly 67% of the word boundaries were correctly hypothesized. Subsequent experiments on input containing errors simulated to represent the speech environment showed that the method remains effective even at error levels as high as 50%. The implications of these results for the development of a speech-to-text conversion system for Hindi are discussed.
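The boundary-spotting idea, and the kind of score the abstract reports (the share of true word boundaries that are correctly hypothesized), can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the romanized function-word list, the greedy longest-first matching, the helper names (`hypothesize_boundaries`, `true_boundaries`, `recall`, `corrupt`), and the uniform random-substitution error model are all hypothetical choices made here for illustration.

```python
import random
from collections.abc import Iterable

# Hypothetical, illustrative romanized Hindi function words (case markers,
# pronouns, conjunctions). This is NOT the list used in the paper.
FUNCTION_WORDS = ("mein", "par", "aur", "vah", "yah",
                  "ka", "ki", "ke", "ko", "se", "ne")


def hypothesize_boundaries(text: str,
                           function_words: Iterable[str] = FUNCTION_WORDS) -> set[int]:
    """Return offsets in unsegmented `text` where a word boundary is hypothesized.

    A boundary is placed just before and just after each spotted function word.
    Matching is greedy and longest-first; characters inside a match are skipped.
    """
    words = sorted(set(function_words), key=len, reverse=True)
    boundaries: set[int] = set()
    i = 0
    while i < len(text):
        for w in words:
            if text.startswith(w, i):
                boundaries.update((i, i + len(w)))
                i += len(w)
                break
        else:  # no function word starts here; move one character on
            i += 1
    # Sentence boundaries (start and end) are given, not hypothesized.
    boundaries.discard(0)
    boundaries.discard(len(text))
    return boundaries


def true_boundaries(sentence: str) -> set[int]:
    """Offsets of the internal word boundaries once spaces are removed."""
    offsets: set[int] = set()
    pos = 0
    for word in sentence.split()[:-1]:
        pos += len(word)
        offsets.add(pos)
    return offsets


def recall(hypothesized: set[int], reference: set[int]) -> float:
    """Fraction of reference boundaries that were correctly hypothesized."""
    return len(hypothesized & reference) / len(reference) if reference else 1.0


def corrupt(text: str, error_rate: float, alphabet: str,
            rng: random.Random) -> str:
    """Hypothetical error model: replace each character with probability
    `error_rate` by a random letter from `alphabet`. Length is preserved, so
    boundary offsets stay comparable; a replacement may equal the original."""
    return "".join(rng.choice(alphabet) if rng.random() < error_rate else c
                   for c in text)


if __name__ == "__main__":
    sentence = "ram ne sita ko kitab di"
    text = sentence.replace(" ", "")
    ref = true_boundaries(sentence)
    hyp = hypothesize_boundaries(text)
    print(sorted(ref), sorted(hyp), recall(hyp, ref))
    noisy = corrupt(text, 0.5, "abcdeghiklmnoprstuvy", random.Random(0))
    print(noisy, recall(hypothesize_boundaries(noisy), ref))
```

On the toy sentence `ram ne sita ko kitab di`, the sketch recovers four of the five true boundaries (offsets 3, 5, 9, 11; recall 0.8) but also places a spurious boundary at offset 13, inside `kitab`, where `ki` is spotted. This is why spotted function words yield boundary hypotheses rather than final decisions.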
Publication date: 2003